Online Supplement to “The Knowledge-Gradient Policy for Correlated Normal Beliefs”
Abstract
As discussed in Section 3 of the main paper, the KG policy possesses several optimality and convergence properties. First, it is optimal by construction when N = 1 (Remark 1). Second, the suboptimality gap between the values of the KG policy and the optimal policy shrinks to 0 as N → ∞ (Theorem 4). This is a convergence result: it shows that sampling under the KG policy is guaranteed to eventually discover the alternative that is truly best. Third, the suboptimality gap is bounded for N between these two extremes (Theorem 5). Here we prove the latter two results, treating the convergence result in Section A.2 and the general bound on suboptimality in Section A.3. These results extend those proved in Frazier et al. (2008) for independent normal priors.
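To make the N = 1 property concrete, the following is a minimal sketch of a knowledge-gradient computation under a correlated normal belief. It is not the authors' algorithm: it estimates each KG factor by Monte Carlo with antithetic samples, and the function names (`sigma_tilde`, `kg_factors`, `update`) and the numerical values are hypothetical, chosen only for illustration. The sketch assumes the standard conjugate update for a multivariate normal prior with independent normal measurement noise of known variance. With one measurement remaining, choosing the alternative with the largest KG factor maximizes the expected value of the best posterior mean, which is why the policy is optimal by construction in that case.

```python
import numpy as np


def sigma_tilde(cov, noise_var, x):
    """Shift of the posterior mean vector per unit of a standard normal Z
    when alternative x is measured (illustrative helper)."""
    return cov[:, x] / np.sqrt(noise_var[x] + cov[x, x])


def kg_factors(mu, cov, noise_var, n_samples=20000, seed=0):
    """Monte Carlo estimate of E[max_i mu'_i] - max_i mu_i for each x."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_samples)
    # Antithetic pairs: the sample mean of z is zero up to rounding.
    z = np.concatenate([z, -z])
    best_now = mu.max()
    factors = np.empty(len(mu))
    for x in range(len(mu)):
        s = sigma_tilde(cov, noise_var, x)
        # Simulated posterior means, one row per draw of Z.
        post_means = mu[None, :] + z[:, None] * s[None, :]
        factors[x] = post_means.max(axis=1).mean() - best_now
    return factors


def update(mu, cov, noise_var, x, y):
    """Conjugate update of (mu, cov) after observing y from alternative x."""
    denom = noise_var[x] + cov[x, x]
    col = cov[:, x]
    new_mu = mu + (y - mu[x]) / denom * col
    new_cov = cov - np.outer(col, col) / denom
    return new_mu, new_cov


# Hypothetical prior: alternatives 0 and 1 are strongly correlated.
mu = np.array([0.0, 0.2, 0.1])
cov = np.array([[1.0, 0.8, 0.0],
                [0.8, 1.0, 0.0],
                [0.0, 0.0, 0.5]])
noise_var = np.array([1.0, 1.0, 1.0])

kg = kg_factors(mu, cov, noise_var)
x_kg = int(np.argmax(kg))  # the KG decision for this belief state

# Measuring alternative 0 also moves the belief about alternative 1.
new_mu, new_cov = update(mu, cov, noise_var, x=0, y=1.0)
```

In this sketch a single observation of alternative 0 shifts the mean of alternative 1 as well, because the two are correlated under the prior, while the uncorrelated alternative 2 is unchanged.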
Similar resources
The knowledge gradient algorithm for online learning
We derive a one-period look-ahead policy for finite- and infinite-horizon online optimal learning problems with Gaussian rewards. The resulting decision rule easily extends to a variety of settings, including the case where our prior beliefs about the rewards are correlated. Experiments show that the KG policy performs competitively against other learning policies in diverse situations. In the ca...
The Knowledge-Gradient Policy for Correlated Normal Beliefs
We consider a Bayesian ranking and selection problem with independent normal rewards and a correlated multivariate normal belief on the mean values of these rewards. Because this formulation of the ranking and selection problem models dependence between alternatives’ mean values, algorithms may utilize this dependence to perform efficiently even when the number of alternatives is very large. We...
The Knowledge Gradient Algorithm for a General Class of Online Learning Problems
We derive a one-period look-ahead policy for finite- and infinite-horizon online optimal learning problems with Gaussian rewards. Our approach is able to handle the case where our prior beliefs about the rewards are correlated, which is not handled by traditional multi-armed bandit methods. Experiments show that our KG policy performs competitively against the best known approximation to the opti...
The effect of language complexity and group size on knowledge construction: Implications for online learning
This study investigated the effect of language complexity and group size on knowledge construction in two online debates. Knowledge construction was assessed using Gunawardena et al.’s Interaction Analysis Model (1997). Language complexity was determined by dividing the number of unique words by total words. It refers to the lexical variation. The results showed that...
Optimal learning for sequential sampling with non-parametric beliefs
We propose a sequential learning policy for ranking and selection problems, where we use a non-parametric procedure for estimating the value of a policy. Our estimation approach aggregates over a set of kernel functions in order to achieve a more consistent estimator. Each element in the kernel estimation set uses a different bandwidth to achieve better aggregation. The final estimate uses a weigh...